Imitation-Projected Programmatic Reinforcement Learning



Reviews: Imitation-Projected Programmatic Reinforcement Learning

Neural Information Processing Systems

This paper addresses the problem of learning programmatic policies, i.e., policies represented in structured classes such as domain-specific programming languages or regression trees. To this end, the paper proposes a "lift-and-project" framework (IPPG) that alternates between (1) optimizing a policy parameterized by a neural network in an unconstrained policy space and (2) projecting the learned knowledge onto the space of policies constrained to have a programmatic representation. Step (1) is carried out with deep policy gradient methods (e.g., DDPG or TRPO), and step (2) by synthesizing programs that imitate the neural policy's behavior (program synthesis via imitation learning). Experiments on TORCS (a simulated car-racing environment) show that the learned programmatic policies outperform both DDPG and methods that imitate or distill a pre-trained neural policy.
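The alternation the review describes is easy to sketch in code. Below is a minimal, hypothetical illustration of the lift-and-project loop; the names (`propel_loop`, `lift`, `project`) are our own placeholders, not identifiers from the paper, and the routines passed in stand for any deep policy-gradient update and any imitation-based program synthesizer.

```python
from typing import Any, Callable

def propel_loop(
    lift: Callable[[Any], Any],     # e.g., DDPG/TRPO updates on a neural policy
    project: Callable[[Any], Any],  # e.g., program synthesis via imitation
    program_policy: Any,
    n_iters: int = 10,
) -> Any:
    """Alternate (1) unconstrained improvement in neural policy space and
    (2) projection back onto the programmatic policy class."""
    for _ in range(n_iters):
        # (1) Lift: improve an unconstrained (neural) policy, typically
        # warm-started from the current programmatic policy.
        neural_policy = lift(program_policy)
        # (2) Project: synthesize a program imitating the improved policy.
        program_policy = project(neural_policy)
    return program_policy
```

For instance, `lift` could run a few hundred DDPG updates on a network initialized to mimic the current program, while `project` could fit a regression tree or a program in a small DSL to state-action pairs sampled from the neural policy.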


Reviews: Imitation-Projected Programmatic Reinforcement Learning

Neural Information Processing Systems

While the reviewers generally support acceptance, some concerns remain. We strongly encourage the authors to address the concerns raised by the reviewers, as there is room for improvement. Although the paper is borderline because of these concerns, it falls on the side of acceptance given the reviewers' general support and the strong support of Reviewer 2.


Imitation-Projected Programmatic Reinforcement Learning

Verma, Abhinav, Le, Hoang, Yue, Yisong, Chaudhuri, Swarat

Neural Information Processing Systems

We study the problem of programmatic reinforcement learning, in which policies are represented as short programs in a symbolic language. Programmatic policies can be more interpretable, generalizable, and amenable to formal verification than neural policies; however, designing rigorous learning approaches for such policies remains a challenge. Our approach to this challenge - a meta-algorithm called PROPEL - is based on three insights. First, we view our learning task as optimization in policy space, modulo the constraint that the desired policy has a programmatic representation, and solve this optimization problem using a form of mirror descent that takes a gradient step into the unconstrained policy space and then projects back onto the constrained space. Second, we view the unconstrained policy space as mixing neural and programmatic representations, which enables employing state-of-the-art deep policy gradient approaches.
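Read as a form of functional mirror descent, each iteration takes a gradient step in the unconstrained space and then projects back via imitation. The notation below is our own reading of this abstract rather than the paper's exact formulation: \(\Pi\) is the programmatic policy class, \(\mathcal{H}\) the unconstrained mixed space, \(J\) the expected return, and \(D\) a divergence that the imitation-learning projection approximately minimizes.

```latex
% Illustrative sketch only; the symbols are our own, not the paper's.
\begin{align*}
  h_t       &\leftarrow \pi_t + \eta\,\nabla J(\pi_t)
            && \text{lift: gradient step in the unconstrained space } \mathcal{H} \\
  \pi_{t+1} &\leftarrow \operatorname*{arg\,min}_{\pi \in \Pi} D(\pi,\, h_t)
            && \text{project: imitate } h_t \text{ with a program in } \Pi
\end{align*}
```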